Attorney Docket 1471/108
AUTOMATIC GENOTYPE DETERMINATION 
Cross Reference to Related Applications 
This application is a continuation of application 
serial no. 08/362,266, filed December 22, 1994, which is a 
continuation in part of application serial no. 08/173,173, 
filed December 23, 1993, which is for an invention entitled
"Automatic Genotype Determination," by Stephen E. Lincoln
and Michael P. Knapp. This immediate parent application is a
continuation in part of application serial no. 07/775,786, 
filed October 11, 1991, for an invention entitled "Nucleic 
Acid Typing by Polymerase Extension of Oligonucleotides 
using Terminator Mixtures," by P. Goelet, M. Knapp, and S. 
Anderson, which in turn is a continuation in part of 
application serial no. 07/664,837, filed March 5, 1991. 
Immediate parent application serial no. 08/173,173 is also a 
continuation in part of application serial no. 08/162,397, 
filed December 6, 1993, for an invention entitled "Method 
for Immobilization of Nucleic Acid Molecules" by T. 
Nikiforov and M. Knapp, and of application serial no. 
08/155,746, filed November 23, 1993, for an invention 
entitled "Method for Generating Single-Stranded DNA 
Molecules" by T. Nikiforov and M. Knapp, and of application 
serial no. 08/145,145, filed November 3, 1993, for an 
invention entitled "Single Nucleotide Polymorphisms and 
their use in Genetic Analysis" by M. Knapp and P. Goelet. 
All of these related applications are hereby incorporated 
herein by reference.



Technical Field 
The present invention relates to the methods and 
devices for determining the genotype at a locus within 
genetic material. 

Summary of the Invention
The present invention provides in one embodiment a 
method of determining the genotype at a locus within genetic 
material obtained from a biological sample. In accordance 
with this method, the material is reacted at the locus to 
produce a first reaction value indicative of the presence of 
a given allele at the locus. There is formed a data set
including the first reaction value. There is also
established a set of one or more probability distributions; 
these distributions associate hypothetical reaction values 
with corresponding probabilities for each genotype of 
interest at the locus. The first reaction value is applied 
to each probability distribution to determine a measure of 
the conditional probability of each genotype of interest at 
the locus. The genotype is then determined based on these
measures.

In accordance with a further embodiment of this method, 
the material at the locus is subject to a second reaction to 
produce a second reaction value independently indicative of 
the presence of a second allele at the locus. A second data 
set is formed and the second reaction value is included in 
the second data set. Each probability distribution 
associates a hypothetical pair of first and second reaction
values with a single probability of each genotype of
interest. The first data set includes other reaction values
obtained under conditions comparable to those under which 
the first reaction value was produced, and the second data 
set includes other reaction values obtained under conditions 
comparable to those under which the second reaction value 
was produced. Where, for example, there are two alleles of 
interest, the first reaction may be an assay for one allele
and the second reaction may be a distinct assay for the 
other allele. The first and second data sets may include 
reaction values for the first and second reactions 
respectively, run under comparable conditions on other 
samples with respect to the same locus. Alternatively, or in 
addition, the data sets may include reaction values for 
reactions run under comparable conditions with respect to 
different loci within the same sample. 

In accordance with a further embodiment, the 
probability distributions may be determined iteratively. In 
this embodiment, each probability distribution is initially 
estimated. Each initial probability distribution is used to 
determine initial genotype probabilities using the reaction 
values in the data sets. The resulting data are then used to
modify the initial probability distribution, so that the 
modified distribution more accurately reflects the reaction 
values in the data set. This procedure may be iterated a 
desired number of times to improve the probability 
distribution. In practice, we have generally found that a 
single iteration is sufficient. 

The foregoing methods have been employed with success
for automatic genotype determination based on assays using
genetic bit analysis (GBA). In such a case, each allele may
typically be a single specific nucleotide. In accordance
with GBA, a reaction is designed to produce a value that is
indicative of the presence of a specific allele at the locus
within the genetic material. In GBA, the approach is
typically to hybridize a specific oligonucleotide to the
genetic material at the locus immediately adjacent to the
nucleotide being interrogated. Next, DNA polymerase is
applied in the presence of differentially labelled
dideoxynucleoside triphosphates. The read-out steps detect
the presence of one or more of the labels which have become
covalently attached to the 3' end of the oligonucleotide.
Details are provided in Theo R. Nikiforov et al., "Genetic
Bit Analysis, a solid phase method for typing single
nucleotide polymorphisms," 22 Nucleic Acids Research, No.
20, 4167-4175 (1994), which is hereby incorporated herein by
reference. However, the present invention is also applicable
to other reaction systems for allele determination, such as
allele-specific hybridization (ASH), sequencing by
hybridization (SBH), oligonucleotide ligase assay (OLA), and
allele-specific amplification, using either the ligase chain
reaction (LCR) or the polymerase chain reaction (PCR). The
alleles assayed may be defined, for example, by a single
nucleotide, a pair of nucleotides, a restriction site, or
(at least in part) by length in nucleotides.

In another embodiment of the invention, there is
provided a method of determining the genotype of a subject
by reacting genetic material taken from the subject at
selected loci. In this embodiment, each locus may be an
identified single nucleotide or group of nucleotides, and
there is produced with respect to each of the selected loci
a reaction value indicative of the presence of a given
allele at each of the selected loci. These reaction values
are used to determine the genotype of the subject or
alternatively a DNA sequence associated with a specific
region of genetic material of the subject. (Indeed a set of
genotypes for selected proximal loci may be used to specify
a sequence of the genetic material.) In further embodiments,
the loci are selected to provide one or more types of
information concerning the subject, including inheritance of
a trait, parentage, identity, and matching tissue with that
of a donor. Alternatively, the loci may be spaced
throughout the entire genome of the subject to assist in
characterizing the genome of the species of the subject.

In a further embodiment of the invention, there is
provided a device for determining the genotype at a locus
within genetic material obtained from a subject. The device
of this embodiment has a reaction value generation
arrangement for producing a first physical state,
quantifiable as a first reaction value, indicative of the
presence of a given allele at the locus, the value
associated with reaction of the material at the locus. The
device also has a storage arrangement for storing a data set
including the first reaction value and other reaction values
obtained under comparable conditions. A distribution
establishment arrangement establishes a set of probability
distributions, including at least one distribution,
associating hypothetical reaction values with corresponding
probabilities for each genotype of interest at the locus. A
genotype calculation arrangement applies the first reaction
value to each pertinent probability distribution to
determine the conditional probability of each genotype of
interest at the locus. A genotype determination arrangement
determines the genotype based on data from the genotype
calculation arrangement.

In a further embodiment, the device may determine the
genotype at selected loci. In this embodiment, the reaction 
generation arrangement can produce a reaction value 
indicative of the presence of a given allele at each of the 
selected loci and the data set includes reaction values 
obtained with respect to each of the selected loci. The 
genotype calculation arrangement applies reaction values 
obtained with respect to each of the selected loci to each
pertinent probability distribution. 

In another further embodiment, the device may determine 
the genotype at a locus within genetic material from each of 
a plurality of samples. In this embodiment, the reaction 
generation arrangement can produce a reaction value 
indicative of the presence of a given allele at the locus of 
material obtained from each sample and the data set includes 
reaction values obtained with respect to each sample. The 
genotype calculation arrangement applies reaction values
obtained with respect to each sample to each pertinent
probability distribution. 

In each of these embodiments the reaction value 
generation arrangement may also include an arrangement for 
producing a second reaction value, independently indicative 
of the presence of a second allele at the locus. The storage 
arrangement then includes a provision for storing the second 
reaction value and other reaction values obtained under 
comparable conditions. The genotype calculation arrangement
applies the first and second reaction values to each 
pertinent probability distribution to determine the 
probability of each genotype of interest at the locus. Each 
probability distribution may be of the type associating a 
hypothetical pair of first and second reaction values with a 
single probability of each genotype of interest. The locus 
may be a single nucleotide, and the reaction value 
generation arrangement may include an optical transducer to 
read reaction results and may determine, on a substantially 
concurrent basis, the reaction values with respect to each 
sample.

The distribution establishment arrangement may be 
configured to assign an initial probability distribution to 
the data set that would associate hypothetical reaction 
values with corresponding probabilities for each genotype of 
interest at the locus. The distribution establishment
arrangement then invokes the genotype calculation means to 
use each initial probability distribution to determine 
initial conditional probabilities for a genotype of interest 
at the locus. Thereafter the distribution establishment
arrangement modifies each initial probability distribution,
so that each modified distribution more accurately reflects
the reaction values stored in the storage means.

The term "reaction value" as used in this description
and the following claims may refer either to a single
numerical value or to a collection of numbers associated
with a physical state produced by the reaction. In the GBA
method described in the Nikiforov article referred to above,
e.g., optical signals are produced that may be read as a
single numerical value. Alternatively, e.g., an optical
signal may be sampled over time, and the reaction value
may be the collection of samples of such a signal. It is
also possible to form a scanned image of one or a series of
optical signals generated by GBA or other reaction methods,
and to digitize this image, so that a collection of pixel
values in all or a portion of the image constitutes a
reaction value.

Brief Description of the Drawings 

The foregoing aspects of the invention will be more
readily understood by reference to the following detailed
description, taken with respect to the following drawings,
in which:

Fig. 1 is a diagram of a device in accordance with a
preferred embodiment of the invention;

Fig. 2 is a diagram of the logical flow in accordance
with the embodiment of Fig. 1;

Fig. 3 is a graph of numeric reaction values (data)
generated by the embodiment of Fig. 1 as well as the
genotype determinations made by the embodiment from these
data;

Figs. 4-7 show probability distributions derived by the
embodiment of Fig. 1 for three genotypes of interest (AA,
AT, and TT) and a failure mode at a locus; and

Fig. 8 is an example of the output of the device in
Fig. 1.

Detailed Description of Specific Embodiments

The invention provides in preferred embodiments a
method and device for genotype determination using genetic
marker systems that produce allele-specific quantitative
signals. An embodiment uses computer processing, employing
computer software we developed and call "GetGenos", of data
produced by a device we also developed to produce GBA data.
The device achieves, among other things, the following:

• Fully automatic genotype determination from
quantitative data. Off-line analysis of data pools is
intended, although the software is fast enough to use
interactively.

• Ability to examine many allele tests per DNA sample
simultaneously. One genotype and confidence measure are
produced from these data.

• A true probabilistic confidence measure (a LOD
score), properly calibrated, is produced for each genotype.

• Use of robust statistical methods: Noise reduction
via selective data pooling and simultaneous search over
points in a data pool, preventing bias.

• Maximal avoidance of arbitrary parameters, and thus
insensitivity to great variation in input data. The small
number of parameters that are required by the underlying
statistical model are fit to the observed data, essentially
using the data set as its own internal control.

• Flexibility for handling multiple data types.
Essentially, only probability distribution calculations,
described below, need to be calibrated to new data types. We
expect that the invention may be applied to GBA, OLA, ASH,
and RAPD-type markers.

Our current embodiment of the software is implemented
in portable ANSI C, for easy integration into a custom
laboratory information system. This code has been
successfully run on:

• Macintosh
• Sun
• MS-DOS
• MS-Windows

In our current embodiment of the software, a number of
consistency checks are performed for GBA data verification,
using both the raw GBA values and the control wells.
Overall statistics for trend analysis and QC are computed.
Brief "Genotype Reports" are generated, summarizing results
for each data set, including failures. All data are output
in a convenient form for import into interactive statistical
packages, such as DataDesk™. The current implementation is
presently restricted to 2-allele tests in diploids - the
situation with present GBA applications.

Referring to Fig. 1, there is shown a preferred 
embodiment of a device in accordance with the present 
invention. The device includes an optical detector 11 to 
produce reaction values resulting from one or more 
reactions. These reactions assay for one or more alleles in 
samples of genetic material. We have implemented the 
detector 11 using a bichromatic microplate reader model 348
and microplate stacker model 33 from ICN Biomedical, Inc.,
P.O. Box 5023, Costa Mesa, California 92626. The microplates
are in a 96 well format, and the reader accommodates 20
microplates in a single processing batch. Accordingly the
device of this embodiment permits large batch processing. The
reactions in our implementation use GBA, as described above. 
The detector 11 is controlled by computer 12 to cause 
selected readout of reaction values from each well. The 
computer 12 is programmed to allow for multiple readout of 
the reaction value from a given well over a period of time. 
The values are stored temporarily in memory and then saved 
in database 14. Computer 13 accesses the database 14 over 
line 15 and processes the data in accordance with the 
procedure described below. Of course, computers 12 and 13 
and database 14 may be implemented by an integral controller 
and data storage arrangement. Such an arrangement could in 
fact be located in the housing of the optical detector 11.

In Fig. 2 is shown the procedure followed by computer
13. The steps of this procedure are as follows:

Input Data: A set of data is loaded under step 21. In
most applications, each experiment in the set should be
testing (i) the same genetic marker, and (ii) the same set
of alleles of that marker, using comparable biochemistry
(e.g., the same reagent batches, etc.). Large data sets help
smooth out noise, although the appropriate size of a data
set depends on the allele frequencies (and thus the number
of expected individuals of each genotypic class). Each data
point in the input data may be thought of as an N-tuple of
numeric values, where N is the number of signals collected
from each DNA sample for this locus. (N will usually be the
number of alleles tested at this marker, denoted A, except
when repeated testing is used, in which case N may be
greater than A.)

Preprocess Data: Next the data are subject to
preprocessing (step 22). An internal M-dimensional Euclidean
representation of the input signals is produced, where each
input datum (an N-tuple) is a point in M-space. Usually, M
will be the same as N and the coordinates of the point will
be the values of the input tuple, and thus the preprocessing
will be trivial (although see the first paragraph of
variations discussed). The Euclidean space may be
non-linear, depending on the best available models of signal
generation. (Completely mathematically equivalently, any
non-linearity may be embodied in the initial probability
distributions, described below.)

Fig. 3 illustrates preprocessed reaction values from
step 22 for GBA locus 177-2 on 80 DNA samples. The X-axis
indicates preprocessed reaction values for allele 1 (A) and
the Y-axis indicates preprocessed reaction values for allele
2 (T). For clarity, the results of genotype determination
are also indicated for each point: triangles are TT
genotype, diamonds are AA, circles are AT, and squares are
failures (no signal).

Probability Distributions: Returning to Fig. 2, under
step 22, initial probability distributions are established
for the G possible genotypes. For example, in a random
diploid population containing A tested alleles:

$$G = A + (A - 1) + \cdots + 1 = \frac{A(A + 1)}{2} \qquad (1)$$
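
As a quick check of equation (1), the count can be computed directly. The following is a minimal sketch in C; the function name count_diploid_genotypes is hypothetical and is not part of the GetGenos software.

```c
#include <assert.h>

/* Number of unordered allele pairs (diploid genotypes) for num_alleles
   tested alleles: A + (A-1) + ... + 1 = A(A+1)/2, as in equation (1).
   Hypothetical helper, shown for illustration only. */
unsigned long count_diploid_genotypes(unsigned long num_alleles)
{
    assert(num_alleles > 0);
    return num_alleles * (num_alleles + 1UL) / 2UL;
}
```

For the two-allele tests to which the current implementation is restricted (A = 2), this gives G = 3, corresponding to the AA, AT, and TT classes of Fig. 3; the failure class is handled separately as an additional distribution.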

The initial conditional probability for any hypothetical
input datum (a point in M-space, denoted X_i) and genotype
(denoted g) is defined as the prior probability of seeing
the signal X_i assuming that g is the correct genotype of
that datum. That is:

$$\Pr(\text{Signal } X_i \mid \text{Genotype} = g), \quad \text{where } X_i = (x_1, \ldots, x_M) \text{ and } g \in \{1, \ldots, G\} \qquad (2)$$

Figures 4 through 7 illustrate the initial probability 
distributions established for the data in Figure 3.
Probability distributions are indicated for the four
genotypic classes of interest, AA, AT, TT and No Signal, in
Figs. 4, 5, 6, and 7 respectively. The shading at each XY
position indicates probability, with darker shades
indicating increased probability for hypothetical data
points with those X and Y reaction values.

Exactly where these distributions come from is highly 
specific to the nature of the input data. The probability 
distributions can either be pre-computed at this step and 
stored as quantized data, or can be calculated on the fly as 
needed in step 23, below. The probability distributions may 
be fixed, or may be fit to the observed data or may be fit 
to assumed genotypes as determined by previous iterations of 
this algorithm. (See Additional Features below.)

Under step 23, we compute the conditional probability
of each genotype. For each datum X_i, the above probabilities
are collected into an overall conditional posterior
probability of each genotype for that datum:

$$\Pr(\text{Genotype} = g \mid \text{Signal } X_i) = \frac{\Pr(\text{Signal } X_i \mid \text{Genotype} = g)\,\Pr(\text{Genotype} = g)}{\Pr(\text{Signal } X_i)} \qquad (3)$$

where

Pr(Genotype = g) is the prior probability of any datum
having genotype g;

Pr(Signal X_i) is the prior probability of the signal (a
constant which may be ignored); and

Pr(Signal X_i | Genotype = g) is the initial
probability defined above.
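
The computation of equation (3) for one datum might be sketched as follows. This is an illustrative sketch only, not the GetGenos code: the function name genotype_posteriors and its argument layout are assumptions. Instead of ignoring Pr(Signal X_i), the sketch obtains it by summing the numerators over all genotypes, so that the resulting posteriors sum to 1.

```c
#include <assert.h>
#include <stddef.h>

/* Fills posterior[g] = Pr(Genotype = g | Signal) from
   likelihood[g] = Pr(Signal | Genotype = g) and prior[g] = Pr(Genotype = g).
   Pr(Signal) is taken as the sum of the numerators of equation (3).
   Hypothetical helper; returns 0 on success, -1 if every numerator is zero. */
int genotype_posteriors(const double *likelihood, const double *prior,
                        size_t num_genotypes, double *posterior)
{
    double evidence = 0.0;
    size_t g;

    assert(likelihood != NULL && prior != NULL && posterior != NULL);
    assert(num_genotypes > 0);

    for (g = 0; g < num_genotypes; g++) {
        posterior[g] = likelihood[g] * prior[g];  /* numerator of (3) */
        evidence += posterior[g];
    }
    if (evidence <= 0.0)
        return -1;
    for (g = 0; g < num_genotypes; g++)
        posterior[g] /= evidence;                 /* divide by Pr(Signal) */
    return 0;
}
```

Normalizing in this way does not change which genotype has the highest posterior, which is why the constant may be ignored when only the ranking is needed.
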

Under step 24, we select the genotype and
compute the confidence score. For each datum, using the
above posterior probabilities, we determine the most likely
genotype assignment g' (the genotype with the highest
posterior probability) and its confidence score. The
confidence score C is simply the log of the odds ratio:

$$C = \log_{10} \frac{\Pr(\text{Genotype} = g' \mid \text{Signal } X_i)}{\sum_{g \neq g'} \Pr(\text{Genotype} = g \mid \text{Signal } X_i)} \qquad (4)$$

It should be noted that this procedure is significant, among
other reasons, because it permits determining a robust
probabilistic confidence score associated with each genotype
determination.

Under step 25, there may be employed adaptive fitting.
A classic iterative adaptive fitting algorithm, such as
Expectation-Maximization (E-M), may be used to increase the
ability to deal with highly different input data sets and
reduce noise sensitivity. In this case, the genotypes
computed in step 24 are used to refit the distributions
(from step 22). In step 25, a convergence test is performed,
which may cause the program to loop back to step 23, but now
using the new distributions.
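
One refit pass of this kind might be sketched as follows. The sketch is a deliberate simplification and is not the fitting used in the preferred embodiment: it works on a single signal axis, uses hard genotype assignments (as in a "hard" E-M or k-means style update) rather than weighted posteriors, and refits only class means. The function name refit_class_means and the use of the largest change in a mean as the convergence measure are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* One refit pass for a single signal axis: recompute the mean signal of
   each genotypic class from the data points currently assigned to it.
   genotype[i] is the class index (0 .. num_classes-1) called for value[i].
   Classes with no assigned points keep their previous mean.
   Returns the largest absolute change in any class mean, which a caller
   could compare against a tolerance as a convergence test.
   Hypothetical helper, for illustration only. */
double refit_class_means(const double *value, const size_t *genotype,
                         size_t num_points, double *class_mean,
                         size_t num_classes)
{
    double max_change = 0.0;
    size_t c, i;

    assert(value != NULL && genotype != NULL && class_mean != NULL);

    for (c = 0; c < num_classes; c++) {
        double sum = 0.0, change;
        size_t count = 0;

        for (i = 0; i < num_points; i++) {
            if (genotype[i] == c) {
                sum += value[i];
                count++;
            }
        }
        if (count == 0)
            continue;                     /* nothing assigned: keep old mean */
        change = sum / (double)count - class_mean[c];
        if (change < 0.0)
            change = -change;
        if (change > max_change)
            max_change = change;
        class_mean[c] = sum / (double)count;
    }
    return max_change;
}
```

A caller would alternate this refit with the genotype calls of steps 23 and 24 until the returned change falls below a chosen tolerance or a fixed number of passes has been made.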

As one example, an E-M search procedure may be used to
maximize the total likelihood, that is, to find the
maximally likely set of genotype assignments given the input
data set. (The net likelihood may be calculated from the
Bayesian probabilities, defined above.) For appropriate
likelihood calculations and probability distributions, the
E-M principle will guarantee that this algorithm always
produces true maximum-likelihood values, regardless of
initial guess, and that it always converges.

Output Data: Under step 26, we output the results 
(genotypes and confidence scores) to the user or to a 
computer database. An example of such output is shown in 
Fig. 8. 

Additional Features 

Additional features may be incorporated into the above 
procedure. They may be integrated into the procedure either 
together or separately, and have all been implemented in a 
preferred embodiment. 

Preprocessing: During steps 21 or 22, the data (either 
input tuples or spatial data points) may be preprocessed in 
order to reduce noise, using any one of many classical 
statistical or signal-processing techniques. Control data 
points may be used in this step. In fact, various types of 
signal filtering or normalizing may be applied at almost any 
step in the algorithm. 

Fitting Probability Distributions: The probability 
distributions calculated in steps 22 and 23 may be fit to 
the input data - that is, each distribution may be a 
function of values which are in part calculated from the 
input data. For example, we may define the conditional 
probability of a signal point for some genotype to be a 
function of the distance between that point and the observed 
mean for that signal.

Using an Initial Genotype Guess: In step 22, either a
simple or heuristic algorithm may be used to produce an
initial genotype guess for each input data point. If a
fairly accurate guess can be produced, then the probability
distributions for each genotype may be fit to the subset of
the data assumed to be of that genotypic class. Another use
of a genotype guess is in initial input validity checks
and/or preprocessing (e.g., step 22), before the remainder of
the algorithm is applied. To be useful, a guess need not
produce complete genotypic information, however.

Using a Null Genotypic Class: In steps 22 and all 
further steps, one (or more) additional probability 
distributions may be added to fit the data to the signals 
one would expect to see if an experiment (e.g. that datum) 
failed. E.g.,



The current implementation above is presently 
restricted to M=2 and N=2*R, where R is the number of 
repeated tests of both alleles. We refer to the two alleles 
as X and Y. The program understands the notion of "plates" 
of data, a number of which make up a data set. 

The Initial Guess Variation is employed to initially 
fit distributions using the heuristic described below. The 
Initial Guess is produced during the Preprocessing Step,
which normalizes and background-subtracts the input data
and removes apparent outlier points as well. These steps are
performed separately for each allele's signal (i.e., 1
dimensional analysis). In fact, this preprocessing is
applied separately to each of the R repeated tests, and the
test with the smallest total 2-dimensional residual is chosen
for use in further steps. Various other preprocessing and
post-processing steps are employed for GBA data validation
and QC. In particular, controls producing a known reaction
value may be employed to assure integrity of the biochemical
process. In a preferred embodiment, signals are assumed to
be small positive numbers (between 0.0 and 5.0), with 0.0
indicating that the allele is likely not present in the
sample, and larger values indicating that it may be.
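
The normalization and background subtraction for one allele's signal might be sketched as below. This is an assumed form, not the GetGenos preprocessing: it is a linear rescaling that maps a negative (background) level to 0.0 and a positive level to 1.0, and the function name normalize_signals and the way the two levels are obtained (for example, from control wells) are assumptions. Outlier removal is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Rescales raw signals for one allele in place so that the negative
   (background) level maps to 0.0 and the positive level maps to 1.0.
   neg_level and pos_level are assumed to be supplied by the caller,
   e.g. from control wells. Hypothetical helper.
   Returns 0 on success, -1 if the two levels coincide and no scale
   can be defined. */
int normalize_signals(double *signal, size_t num_points,
                      double neg_level, double pos_level)
{
    double span = pos_level - neg_level;
    size_t i;

    assert(signal != NULL);
    if (span == 0.0)
        return -1;
    for (i = 0; i < num_points; i++)
        signal[i] = (signal[i] - neg_level) / span;
    return 0;
}
```

Applied separately to the X and Y signals, a rescaling of this kind would place each plate's data on the common scale in which the class means are near 0.0 and 1.0 along each axis.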

To handle a wide range of input data signal strengths,
the Adaptive Fitting Variation is employed. However, the
program is hard-coded to perform exactly one or two
iteration passes through step 25, which we find works
well for existing GBA data.

The probability distributions we fit at present in
steps 22 and 25 have as their only parameters (i) the ratio
of the X and Y signals for heterozygotes, and (ii) the
variance from the normalized means (0.0 for negative for that
allele, 1.0 for positive for that allele) along each axis
separately. In fact, these latter numbers are constrained to
be at least a fixed minimum, which is rarely exceeded, so
that the algorithm will work with very small quantities of
data and will produce the behavior we want. These numbers
are computed separately for each microtiter plate. The
probability distributions are generated using the code
(written in C) attached hereto and incorporated herein by
reference as Appendix A.

The Null-Class variant is used to provide a genotypic
class indicating No Signal.

Quality control may also be enhanced in a surprising
manner using the procedures described here. In particular,
the confidence score C of equation (4) serves as a robust
indicator of the performance of the biochemical reaction
system. For example, a downward trend in the confidence
scores within a single batch or in successive batches may
indicate deterioration of an important reagent or of a
sample or miscalibration of the instrumentation.

Accordingly, in a preferred embodiment, the computer
may be used to determine the presence of a downward trend in
the confidence score over time calculated in reference to
each of the following variables: the locus (is there a
downward trend in the confidence score of a single locus
relative to other loci tested?), the sample (is there a
downward trend in the confidence score of a single sample
relative to other samples tested?), plate (is there a
downward trend in the confidence score of this plate
relative to other plates?), and batch (relative to other
batches). If a downward trend of statistical significance
(using, for example, a chi-square test) is detected, an alarm
condition is entered.
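
A simple way to quantify such a trend is sketched below. Note that this sketch substitutes an ordinary least-squares slope of the confidence scores against run order for the chi-square test mentioned above, and it does not itself assess statistical significance; the function name confidence_trend_slope is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Least-squares slope of confidence scores against their run order
   (0, 1, 2, ...). A clearly negative slope suggests a downward trend.
   Hypothetical helper; significance testing is left to the caller. */
double confidence_trend_slope(const double *score, size_t num_scores)
{
    double mean_x, mean_y = 0.0, sxy = 0.0, sxx = 0.0;
    size_t i;

    assert(score != NULL && num_scores >= 2);

    mean_x = (double)(num_scores - 1) / 2.0;
    for (i = 0; i < num_scores; i++)
        mean_y += score[i];
    mean_y /= (double)num_scores;

    for (i = 0; i < num_scores; i++) {
        double dx = (double)i - mean_x;
        sxy += dx * (score[i] - mean_y);
        sxx += dx * dx;
    }
    return sxy / sxx;
}
```

The same function could be applied to scores grouped by locus, sample, plate, or batch, with the alarm threshold chosen by the laboratory.
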

Because the confidence score is an accurate indication
of the reliability of the reaction system and the genotype
determination, a low confidence score associated with a
given determination is taken as indicating the need for
retesting.

APPENDIX A

/* The probability distributions in Figures 4, 5, 6, and 7, respectively
   correspond to the values of xx_prob, xy_prob, yy_prob, and ns_prob, for
   all possible values of the preprocessed reaction values (x_val and y_val)
   in the range of interest (0.0 to 3.0). */

#include <math.h>

/* TINY_PROB and max() are used below but are not defined in this listing;
   the two definitions here are assumed placeholders. */
#define TINY_PROB 1.0e-30
#define max(a,b) ((a) > (b) ? (a) : (b))

/* We assume that the following global variables are set... */
double x_pos_mean, x_neg_mean, y_pos_mean, y_neg_mean;
double x_val, y_val;

/* And we set the following globals... */
double xx_prob, xy_prob, yy_prob, ns_prob;

#define POS_VARIANCE 0.25
#define POS_VARIANCE_INCREMENT 0.00
#define NEG_VARIANCE 0.05
#define NEG_VARIANCE_INCREMENT 0.10
#define HET_VARIANCE 0.10
#define HET_VARIANCE_INCREMENT 0.20

/* Probability that val is negative, with a spread that widens as the
   other allele's signal (given_val) grows. */
#define COND_NEG_PROB(val,given_val,val_mean) \
  normal_prob(val_mean-val,NEG_VARIANCE + NEG_VARIANCE_INCREMENT*given_val)
/* Probability that val matches given_val, as expected for a heterozygote. */
#define COND_HET_PROB(val,given_val) \
  normal_prob(given_val-val,HET_VARIANCE + HET_VARIANCE_INCREMENT)

/* Unnormalized Gaussian of the deviation, floored at TINY_PROB. */
double normal_prob(double deviation, double sigma)
{
  double val=exp(-(deviation*deviation)/(2.0*sigma*sigma));
  return (val>=TINY_PROB ? val : TINY_PROB);
}

void compute_probs(void)
{
  double x_pos_prob, y_pos_prob, x_neg_prob, y_neg_prob;

  x_pos_prob=normal_prob((x_pos_mean-x_val), POS_VARIANCE);
  x_neg_prob=normal_prob((x_neg_mean-x_val), NEG_VARIANCE);
  y_pos_prob=normal_prob((y_pos_mean-y_val), POS_VARIANCE);
  y_neg_prob=normal_prob((y_neg_mean-y_val), NEG_VARIANCE);

  ns_prob=max(x_neg_prob * COND_NEG_PROB(y_val,x_val,y_neg_mean),
              y_neg_prob * COND_NEG_PROB(x_val,y_val,x_neg_mean));
  xx_prob=x_pos_prob * COND_NEG_PROB(y_val,x_val,y_neg_mean);
  yy_prob=y_pos_prob * COND_NEG_PROB(x_val,y_val,x_neg_mean);
  xy_prob=max(x_pos_prob * COND_HET_PROB(y_val,x_val),
              y_pos_prob * COND_HET_PROB(x_val,y_val));
}
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What is claimed is : 

1. A method of determining the genotype at a 
locus within genetic material obtained from a biological 
sample, the method comprising: 

A. reacting the material at the locus to produce
a first reaction value indicative of the presence of a given
allele at the locus;

B. forming a data set including the first
reaction value;

C. establishing a distribution set of
probability distributions, including at least one
distribution, associating hypothetical reaction values with
corresponding probabilities for each genotype of interest at
the locus;

D. applying the first reaction value to each
pertinent probability distribution to determine a measure of
the conditional probability of each genotype of interest at
the locus; and

E. determining the genotype based on the data
obtained from step (D).

2. A method according to claim 1, wherein the 
distribution set includes a plurality of probability 
distributions for a corresponding plurality of genotypes of 
interest.

3. A method, according to claim 1, further
comprising:

(i) reacting the material at the locus to produce
a second reaction value independently, indicative of the
presence of a second allele at the locus;

(ii) forming a second data set including the
second reaction value; and

(iii) applying the first and second reaction
values to each pertinent distribution to determine a measure
of the conditional probability of each genotype at the
locus.

4. A method according to claim 2, further
comprising:

(i) reacting the material at the locus to produce
a second reaction value;

(ii) applying the first and second reaction values
to each pertinent distribution to determine the probability
of each genotype at the locus; and

(iii) applying the first and second reaction values
to each pertinent distribution to determine a measure of the
conditional probability of each genotype at the locus.

5. A method according to claim 3, wherein each
probability distribution associates a hypothetical pair of 
first and second reaction values with a single probability 
of each genotype of interest. 



6. A method according to claim 4, wherein each
probability distribution associates a hypothetical pair of
first and second reaction values with a single probability
of each genotype of interest.



7. A method according to claim 1, wherein: 
step (B) includes the step of including in the
data set other reaction values obtained under conditions
comparable to those under which the first reaction value was
produced; and

step (C) includes the step of using the reaction
values in the data set to establish the probability
distributions; the method further comprising:

performing steps (D) and (E) with respect to each
of the reaction values.

8. A method according to claim 2, wherein:

step (B) includes the step of including in the
data set other reaction values obtained under conditions
comparable to those under which the first reaction value was
produced; and

step (C) includes the step of using the reaction
values in the data set to establish the probability
distributions; the method further comprising:

performing steps (D) and (E) with respect to each 
of the reaction values. 
9. A method according to claim 3, wherein:
step (B) includes the step of including in the
data set other reaction values obtained under conditions
comparable to those under which the first reaction value was
produced; and

step (C) includes the step of using the reaction
values in the data set to establish the probability 
distributions; the method further comprising: 

performing steps (D) and (E) with respect to each 
of the reaction values in the first and second data sets. 

10. A method according to claim 4, wherein: 
step (B) includes the step of including in the
data set other reaction values obtained under conditions
comparable to those under which the first reaction value was
produced; and

step (C) includes the step of using the reaction 
values in the data set to establish the probability 
distributions; the method further comprising: 

performing steps (D) and (E) with respect to each 
of the reaction values in the first and second data sets. 

11. A method, according to claim 1, of 
determining the genotype at a locus within genetic material 
obtained from each of a plurality of samples, the method 
further comprising: 

(1) performing step (A) with respect to the locus 
of material obtained from each sample; 

(2) in step (B), including in the data set
reaction values obtained from each sample. 
12. A method according to claim 7, of determining
the genotype of selected loci within genetic material
obtained from a sample, the method further comprising:

(1) performing step (A) at each of the selected
loci;

(2) in step (B), including in the data set
reaction values obtained from each of the selected loci.

13. A method according to claim 7, wherein step 
(C) includes: 

(1) establishing a set of initial probability 
distributions that associate hypothetical reaction values 
with corresponding probabilities for each genotype of 
interest at the locus; 

(2) using the initial probability distributions 
to determine measures of the initial conditional probability 
for each genotype at the locus; and 

(3) using the results of step (2) to modify the 
initial probability distributions, so that the modified 
distributions more accurately reflect the reaction values in 
the data set. 

14. A method according to claim 8, wherein step 
(C) includes: 

(1) establishing a set of initial probability 
distributions that associate hypothetical reaction values 
with corresponding probabilities for each genotype of 
interest at the locus; 
(2) using the initial probability distributions
to determine measures of the initial conditional probability 
for each genotype at the locus; and 

(3) using the results of step (2) to modify the 
initial probability distributions, so that the modified 
distributions more accurately reflect the reaction values in 
the data set.

15. A method according to claim 9, wherein step
(C) includes: 

(1) establishing a set of initial probability 
distributions that associate hypothetical reaction values 
with corresponding probabilities for each genotype of 
interest at the locus; 

(2) using the initial probability distributions 
to determine measures of the initial conditional probability 
for each genotype at the locus; and 

(3) using the results of step (2) to modify the 
initial probability distributions, so that the modified 
distributions more accurately reflect the reaction values in 
the data set.

16. A method according to claim 10, wherein step 
(C) includes: 

(1) establishing a set of initial probability 
distributions that associate hypothetical reaction values
with corresponding probabilities for each genotype of 
interest at the locus; 
(2) using the initial probability distributions
to determine initial conditional probabilities for each 
genotype at the locus; and 

(3) using the results of step (2) to modify the 
initial probability distributions, so that the modified 
distributions more accurately reflect the reaction values in
the data.



17. A method according to claim 13, wherein step 
(C) further includes: 

(4) repeating steps (1) through (3) a desired 
number of times. 



18. A method according to claim 14, wherein step 
(C) further includes: 

(4) repeating steps (1) through (3) a desired 
number of times. 

19. A method according to claim 15, wherein step 
(C) further includes: 

(4) repeating steps (1) through (3) a desired 
number of times.



20. A method according to claim 16, wherein step 
(C) further includes: 

(4) repeating steps (1) through (3) a desired 
number of times.



21. A method according to claim 1, wherein step 
(E) further includes the step of calculating a confidence 
score, associated with the genotype being determined, based 
on data obtained from step (D).

22. A method according to claim 3, wherein step 
(E) further includes the step of calculating a confidence 
score, associated with the genotype being determined, based 
on data obtained from step (D).

23. A method according to claim 7, wherein step 
(E) further includes the step of calculating a confidence
score, associated with the genotype being determined, based
on data from step (D), the method further comprising (F)
determining whether a significant downward trend in 
confidence scores has occurred, and, in such event, entering 
an alarm condition. 
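
Step (F) may be sketched, under stated assumptions, as a least-squares fit of confidence scores against the order in which the samples were processed, with an alarm raised when the fitted slope falls below a chosen negative threshold. What counts as a "significant" trend is not defined here, so the threshold min_drop and the function names score_slope and downward_trend_alarm are hypothetical.

```c
#include <stddef.h>

/* Least-squares slope of scores[0..n-1] against the sample index. */
double score_slope(const double *scores, size_t n)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0, denom;
    size_t i;

    for (i = 0; i < n; i++) {
        double x = (double)i;
        sx += x;
        sy += scores[i];
        sxx += x * x;
        sxy += x * scores[i];
    }
    denom = (double)n * sxx - sx * sx;
    return denom != 0.0 ? ((double)n * sxy - sx * sy) / denom : 0.0;
}

/* Returns 1 (alarm condition) when the scores fall by more than min_drop
   per sample on average, and 0 otherwise. */
int downward_trend_alarm(const double *scores, size_t n, double min_drop)
{
    return n >= 2 && score_slope(scores, n) < -min_drop;
}
```

A fitted slope is less sensitive to a single poor sample than a comparison of consecutive scores, which is why it is used in this sketch.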

24. A method according to claim 9, wherein step 
(E) further includes the step of calculating a confidence 
score, associated with the genotype being determined, based 
on data from step (D), the method further comprising (F)
determining whether a significant downward trend in
confidence scores has occurred, and, in such event, entering
an alarm condition.

25. A method according to claim 1, wherein each 
allele is a single specific nucleotide. 



26. A method according to claim 4, wherein each 
allele is a single nucleotide. 

27. A method according to claim 1, wherein each 
allele consists of at least two specific nucleotides. 

28. A method according to claim 4, wherein each 
allele consists of at least two specific nucleotides. 

29. A method according to claim 1, wherein each 
allele is defined at least in part by its length in 
nucleotides. 

30. A method according to claim 4, wherein each 
allele is defined at least in part by its length in 
nucleotides.

31. A method according to claim 1, wherein each 
allele is defined by one of the presence and absence of at 
least one restriction site. 

32. A method according to claim 4, wherein each 
allele is defined by one of the presence and absence of at 
least one restriction site. 
33. A method according to claim 4, wherein step
(B) includes the step of including in the data set reaction
values from prior tests at the locus obtained under
comparable conditions.

34. A method according to claim 12, wherein the 
loci are selected on the basis of their ability to 
discriminate among subjects.

35. A method, according to claim 3, wherein the 
step A of reacting the material involves using a different 
reaction from that of step A and the second allele is 
different from the given allele. 

36. A method according to claim 1, wherein step
(A) includes the step of assaying for the given allele using
genetic bit analysis.

37. A method according to claim 1, wherein step 
(A) includes the step of assaying for the given allele using 
hybridization.

38. A method, according to claim 1, wherein step 
(A) includes the step of assaying for the given allele using 
allele-specific amplification.

39. A method, according to claim 1, wherein step 
(A) includes the step of assaying for the given allele using
a polymerase chain reaction. 



40. A method, according to claim 1, wherein step 
(A) includes the step of assaying for the given allele using
a ligase chain reaction. 



41. A method according to claim 12, wherein the 
loci are proximal to one another, so that the set of 
genotypes so produced may indicate a sequence of nucleotides 
associated with the genetic material.

42. A method of determining the genotype of a
subject, the method comprising: 

A. reacting genetic material taken from the 
subject at selected loci, each locus being an identified 
single nucleotide, to produce with respect to each of the 
selected loci a reaction value indicative of the presence of 
a given allele at each of the selected loci; 

B. using the reaction values to determine the 
genotype of the subject and a confidence score, associated 
with the genotype being determined. 

43. A method according to claim 42, wherein the 
loci are selected to provide information pertaining to 
inheritance of a trait. 

44. A method according to claim 42, wherein the 
loci are selected to provide information pertaining to 
parentage of the subject. 



45. A method according to claim 42, wherein the 
loci are selected to provide information pertaining to the
identity of the subject. 



46. A method according to claim 42, wherein the 
loci are selected to provide information pertaining to
matching tissue of the subject with that of a donor.

47. A method according to claim 42, wherein the 
loci are spaced throughout the entire genome of the subject
to assist in characterizing the genome of the species of the
subject.